1) validity
Zahra Fayaz

Validity

Introduction

The primary concern in test development and use is demonstrating not only that test scores are reliable, but also that the interpretations and uses we make of test scores are valid. In examining validity, we consider the relationships between test performance and other types of performance in other contexts. The types of performance and contexts we select for investigation will be determined by the uses or interpretations we wish to make of the test results. Furthermore, since the uses we make of test scores inevitably involve value judgments, and have both educational and societal consequences, we must also carefully examine the value systems that justify a given use of test scores.

Messick (1989) describes validity as ‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores’.

Validity . . . is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself. (American Psychological Association 1985: 9)

The examination of validity has also traditionally focused on the types of evidence that need to be gathered to support a particular meaning or use. Given the significant role that testing now plays in influencing educational and social decisions about individuals, however, we can no longer limit our investigation of validity to collecting factual evidence to support a given interpretation or use. Since testing takes place in an educational or social context, we must also consider the educational and social consequences of the uses we make of tests.
Examining the validity of a given use of test scores is therefore a complex process that must involve the examination of both the evidence that supports that interpretation or use and the ethical values that provide the basis or justification for that interpretation or use (Messick 1975, 1980, 1989). Although we often speak of a given test’s validity, this is misleading, because validity is not simply a function of the content and procedures of the test itself.

Now, What is Validity?

Validity is the extent to which a test measures what it claims to measure. A test must be valid for its results to be accurately applied and interpreted. Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure.

Test validity concerns the test and assessment procedures used in psychological and educational testing, and the extent to which these measure what they purport to measure. “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.”[1] Although classical models divided the concept into various "validities" (such as content validity, criterion validity, and construct validity),[2] the currently dominant view is that validity is a single unitary construct.[3]

Validity is generally considered the most important issue in psychological and educational testing[4] because it concerns the meaning placed on test results.[3] Though many textbooks present validity as a static construct,[5] various models of validity have evolved since the first published recommendations for constructing psychological and educational tests.[6] These models can be categorized into two primary groups: classical models, which include several types of validity, and modern models, which present validity as a single construct.
The modern models reorganize classical "validities" into either "aspects" of validity[3] or types of validity-supporting evidence.

Validation process

According to the 1999 Standards,[1] validation is the process of gathering evidence to provide “a sound scientific basis” for interpreting the scores as proposed by the test developer and/or the test user. Validation therefore begins with a framework that defines the scope and aspects (in the case of multi-dimensional scales) of the proposed interpretation. The framework also includes a rational justification linking the interpretation to the test in question.

Validity researchers then list a series of propositions that must be met if the interpretation is to be valid. Or, conversely, they may compile a list of issues that may threaten the validity of the interpretations. In either case the researchers proceed by gathering evidence (original empirical research, meta-analysis or review of existing literature, or logical analysis of the issues) to support or to question the interpretation’s propositions (or the threats to the interpretation’s validity). Emphasis is placed on quality, rather than quantity, of the evidence.

A single interpretation of any test may require several propositions to be true (or may be questioned by any one of a set of threats to its validity). Strong evidence in support of a single proposition does not lessen the requirement to support the other propositions.

Evidence to support (or question) the validity of an interpretation falls into one of five categories:

1. Evidence based on test content
2. Evidence based on response processes
3. Evidence based on internal structure
4. Evidence based on relations to other variables
5. Evidence based on consequences of testing

Techniques to gather each type of evidence should only be employed when they yield information that would support or question the propositions required for the interpretation in question.
Each piece of evidence is finally integrated into a validity argument. The argument may call for a revision to the test, its administration protocol, or the theoretical constructs underlying the interpretations. If the test and/or the interpretations meant to be made of the test’s results are revised in any way, a new validation process must gather evidence to support the new version.

In test validation we are not examining the validity of the test content or even of the test scores themselves, but rather the validity of the way we interpret or use the information gathered through the testing procedure. To refer to a test or test score as valid, without reference to the specific ability or abilities the test is designed to measure and the uses for which the test is intended, is therefore more than a terminological inaccuracy.

Types of validity:

There are three types of validity:

A) Content validity:

When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics. In some instances where a test measures a trait that is difficult to define, an expert judge may rate each item’s relevance. Because each judge’s rating is a matter of opinion, two independent judges rate the test separately. Items that are rated as strongly relevant by both judges will be included in the final test.

B) Criterion-related validity:

A test is said to have criterion-related validity when it has demonstrated its effectiveness in predicting a criterion or indicators of a construct. There are two different types of criterion validity:

• Concurrent validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion.
For example, a test that measures levels of depression would be said to have concurrent validity if it measured the current levels of depression experienced by the test taker.

• Predictive validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations.

C) Construct validity:

A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. Intelligence tests are one example of measurement instruments that should have construct validity.

Validity as a unitary concept

Although validity has traditionally been discussed in terms of different types, as pointed out above, psychometricians have increasingly come to view it as a single, unitary concept. Messick (1980, 1988b) has argued that even viewing different approaches to validation (content, criterion-related, construct) as separate lines of evidence for supporting given score interpretations is inadequate, and that the consideration of values and consequences of score use has an essential role in validity considerations.

The evidential basis of validity

In defining validity as ‘the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores’ (American Psychological Association 1985: 9), the measurement profession has clearly linked validity to the inferences that are made on the basis of test scores. The process of validation, therefore, ‘starts with the inferences that are drawn and the uses that are made of scores. . . . these uses and inferences dictate the kinds of evidence and logical arguments that are required to support judgments regarding validity’ (Linn 1980: 548).
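As a rough illustration of the criterion-related evidence described earlier, the relationship between test scores and a criterion measure is commonly summarized with a correlation coefficient. The sketch below is not taken from the sources cited here: it assumes Pearson's r as the index, and the function name `pearson_r` and all of the scores are hypothetical. The same computation serves concurrent evidence (criterion collected at the time of testing) and predictive evidence (criterion collected later); only the timing of the criterion data differs.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure.

    Hypothetical helper for illustration; both lists must be paired
    by test taker and have the same length.
    """
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n

    # Sum of cross-products and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(test_scores, criterion_scores))
    ss_x = sum((x - mean_x) ** 2 for x in test_scores)
    ss_y = sum((y - mean_y) ** 2 for y in criterion_scores)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: one row position per test taker.
aptitude_test = [52, 61, 70, 45, 80, 66]
rating_same_day = [50, 65, 72, 40, 78, 60]     # concurrent criterion
rating_one_year_on = [48, 58, 75, 50, 70, 69]  # predictive criterion

concurrent_r = pearson_r(aptitude_test, rating_same_day)
predictive_r = pearson_r(aptitude_test, rating_one_year_on)
print(f"concurrent r = {concurrent_r:.2f}, predictive r = {predictive_r:.2f}")
```

A coefficient of this kind is only one piece of evidence. In keeping with the unitary view, it supports a particular interpretation of the scores and does not make the test itself "valid".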
Judging the extent to which an interpretation or use of a given test score is valid thus requires the collection of evidence supporting the relationship between the test score and an interpretation or use. Messick (1980, 1989) refers to this as the ‘evidential’ basis for test interpretation and use. The evidence that we collect in support of a particular test use can be grouped into three general types: content relevance, criterion relatedness, and meaningfulness of construct. And while these have typically been discussed as different kinds of validity (content, criterion, and construct), they can be more appropriately viewed as complementary types of evidence that must be gathered in the process of validation.

Content relevance and content coverage (content validity)

One of the first characteristics of a test that we, as prospective test users, examine is its content. If we cannot examine an actual copy of the test, we would generally like to see a table of specifications and example items, or at least a listing of the content areas covered, and the number of items, or relative importance, of each area. Likewise, in developing a test, we begin with a definition of the content or ability domain, or at the very least, with a list of content areas, from which we generate items, or test tasks. The consideration of test content is thus an important part of both test development and test use. Demonstrating that a test is relevant to and covers a given area of content or ability is therefore a necessary part of validation.

There are two aspects to this part of validation: content relevance and content coverage. The investigation of content relevance requires ‘the specification of the behavioral domain in question and the attendant specification of the task or test domain’ (Messick 1980: 1017).
While it is generally recognized that this involves the specification of the ability domain, what is often ignored is that examining content relevance also requires the specification of the test method facets. The domain specification that is necessary for examining content relevance is essentially the process of operationally defining constructs, which was discussed in Chapter 2 (pp. 42-4). The importance of also specifying the test method facets that define the measurement procedures is clear from Cronbach’s description of validation:

a validation study examines the procedure as a whole. Every aspect of the setting in which the test is given and every detail of the procedure may have an influence on performance and hence on what is measured. Are the examiner’s sex, status, and ethnic group the same as those of the examinee? Does he put the examinee at ease? Does he suggest that the test will affect the examinee’s future, or does he explain that he is merely checking out the effectiveness of the instructional method? Changes in procedure such as these lead to substantial changes in ability- and personality-test performance, and hence in the appropriate interpretation of test scores. . . . The measurement procedure being validated needs to be described with such clarity that other investigators could reproduce the significant aspects of the procedure themselves. (Cronbach 1971: 449)

Summary

The most important quality to consider in the development, interpretation, and use of language tests is validity, which has been described as a unitary concept related to the adequacy and appropriateness of the way we interpret and use test scores. While validity is the most important quality of test use, reliability is a necessary condition for validity, in the sense that test scores that are not reliable cannot provide a basis for valid interpretation and use.
In examining reliability we must identify potential sources of measurement error and estimate the magnitude of their effects on test scores. In investigating validity, on the other hand, we examine the extent to which factors other than the language abilities we want to measure affect test performance. The process of validation is a continuous one, involving both logical analysis and empirical investigation. Evidence for the validity of test interpretations is of several types, and can be gathered in a number of ways.

Reference:

1. Bachman, L. F. (1996). Fundamental considerations in language testing. Bibliography, pp. 361-394.
2. Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385-398.
3. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
4. Popham, W. J. (2008). All about assessment / A misunderstood grail. Educational Leadership, 66(1), 82-83.
5. See the otherwise excellent text: Nitko, J. J., & Brookhart, S. M. (2004). Educational assessment of students. Upper Saddle River, NJ: Merrill-Prentice Hall.
6. American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1954). Technical recommendations for psychological tests and diagnostic techniques. Washington, DC: The Association.
7. Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Lawrence Erlbaum.
8. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
9. Cronbach, L. J. (1969). Validation of educational measures. Proceedings of the 1969 Invitational Conference on Testing Problems. Princeton, NJ: Educational Testing Service, 35-52.